Step-Ahead Error Feedback for Distributed Training with Compressed Gradient
Authors
Abstract
Although distributed machine learning methods can speed up the training of large deep neural networks, the communication cost has become a non-negligible bottleneck that constrains performance. To address this challenge, gradient compression based communication-efficient distributed training methods were designed to reduce the communication cost, and more recently local error feedback was incorporated to compensate for the corresponding performance loss. However, in this paper we will show that a new "gradient mismatch" problem is raised by local error feedback in centralized distributed training and can lead to degraded performance compared with full-precision training. To solve this critical problem, we propose two novel techniques, 1) step ahead and 2) error averaging, with rigorous theoretical analysis. Both our theoretical and empirical results show that the proposed methods can handle the "gradient mismatch" problem. The experimental results show that we can even train faster with common gradient compression schemes than both full-precision training and local error feedback regarding training epochs, without performance loss.
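The abstract names the two techniques but this page carries no pseudocode, so the following is a rough single-worker sketch only: error-feedback SGD with a top-k compressor plus a "step ahead" gradient evaluated at the error-corrected point. All identifiers (topk_compress, step_ahead_ef_step, grad_fn) are hypothetical, the top-k compressor is a common stand-in rather than a choice taken from the paper, and the update rule is one plausible reading of "step ahead", not the authors' exact algorithm. The companion "error averaging" technique, which averages the local residuals across workers, is omitted here.

    import torch

    def topk_compress(x, ratio=0.01):
        # Keep only the largest-magnitude fraction of entries; a common
        # stand-in gradient compressor (hypothetical helper).
        k = max(1, int(x.numel() * ratio))
        flat = x.flatten()
        idx = flat.abs().topk(k).indices
        out = torch.zeros_like(flat)
        out[idx] = flat[idx]
        return out.view_as(x)

    def step_ahead_ef_step(w, grad_fn, e, lr=0.1, ratio=0.01):
        # One worker-side update; e is the residual (compression error)
        # carried across steps.
        g = grad_fn(w - lr * e)      # "step ahead": gradient taken at the
                                     # error-corrected point, not at w
        p = g + e                    # error feedback: fold the residual in
        c = topk_compress(p, ratio)  # the only tensor that would be sent
        e = p - c                    # new residual kept locally
        w = w - lr * c               # apply the compressed update
        return w, e

On a toy quadratic loss this runs end to end:

    w, e = torch.randn(1000), torch.zeros(1000)
    grad_fn = lambda v: 2 * v    # gradient of ||v||^2
    for _ in range(200):
        w, e = step_ahead_ef_step(w, grad_fn, e)

Plain local error feedback would instead compute g = grad_fn(w); evaluating the gradient one step ahead is what, per the abstract, is meant to counteract the mismatch between the compressed and full-precision gradient trajectories.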
Similar resources
Error Exponents for Distributed Detection with Feedback
We investigate the effects of feedback on a decentralized detection system consisting of N sensors and a detection center. It is assumed that observations are independent and identically distributed across sensors, and that each sensor compresses its observations into a fixed number of quantization levels. We consider two variations on this setup. One entails the transmission of sensor data to ...
Step Ahead
Results: There was no impact of the intervention on change in BMI from baseline to 12 (0.272; 95% CI −0.271, 0.782) or 24 months (0.276; 95% CI −0.338, 0.890) in intention-to-treat analysis. When intervention exposure (scale 0 to 100) was used as the independent variable, there was a decrease of 0.012 BMI units (95% CI −0.025, −0.001) for each unit increase in intervention participation at the 24...
BMP Signalling: Synergy and Feedback Create a Step Gradient
More than a decade ago, genetic evidence predicted the existence of a Dpp gradient in the early Drosophila embryo. Two recent studies finally reveal Dpp distribution, providing further insights into the mechanism of BMP gradient formation.
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure. The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections....
Error evolution in multi-step ahead streamflow forecasting for the operation of hydropower reservoirs
Georgia Papacharalampous*, Hristos Tyralis, and Demetris Koutsoyiannis; Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece. * Corresponding author, [email protected]. Abstract: Multi-step ahead streamflow forecasting is of practical i...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i12.17254